The increasing complexity of coupled hydrodynamic-ecosystem models may require skill assessment methods that both quantify various aspects of model performance and visually summarize these aspects within compact diagrams. Summary diagrams, such as the Taylor diagram (Taylor 2001, Journal of Geophysical Research, 106, D7, 7183-7192), may meet this requirement by exploiting mathematical relationships between widely known statistical quantities in order to succinctly display a suite of model skill metrics in a single plot. In this paper, sensitivity results from a coupled model are compared with Sea-viewing Wide Field-of-view Sensor (SeaWiFS) satellite ocean color data in order to assess the utility of the Taylor diagram and to develop a set of alternatives. Summary diagrams are only effective as skill assessment tools insofar as the statistical quantities they communicate adequately capture differentiable aspects of model performance. Here we demonstrate how the linear correlation coefficients and variance comparisons (pattern statistics) that constitute a Taylor diagram may fail to identify other potentially important aspects of coupled model performance, even if these quantities appear close to their ideal values. An additional skill assessment tool, the target diagram, is developed in order to provide summary information about how the pattern statistics and the bias (difference of mean values) each contribute to the magnitude of the total Root-Mean-Square Difference (RMSD). In addition, a potential inconsistency in the use of RMSD statistics as skill metrics for overall model and observation agreement is identified: underestimates of the observed field's variance are rewarded when the linear correlation scores are less than unity. An alternative skill score and a skill score-based summary diagram are presented.

Published by Elsevier B.V.


1. Introduction

In general, mechanistic models that seek to simulate some natural phenomena must invariably be compared to observations in order to assess the model's skill. In accordance with this special volume on model skill assessment, we define skill as the model's fidelity to the truth. We further presume that, since the truth cannot be known, assessment of model skill must begin with a quantification of the misfit between model results and imperfect observations. An overview of various model skill metrics, which may include known statistical quantities or novel functions and mathematical techniques, is given in Stow et al. (2009). In this paper, we present a pragmatic evaluation of some widely known statistical quantities for the purpose of model skill assessment, as well as of how relationships between these quantities may be exploited to make compact diagrams that summarize multiple aspects of model performance, i.e., summary diagrams. An important component of this analysis is the relationship between various statistical quantities, which may be utilized to produce summary diagrams but may also be deceptive if additional information is not presented. It is the general aim of this paper to demonstrate that a comprehensive and balanced approach to quantitative model skill assessment should include, at the very least, an acknowledgement of these relationships and an understanding of how they may influence the appearance of model skill.

More specifically, summary diagrams may be particularly suited to the task of skill assessment for spatially complex models with multiple state variables, such as a marine ecosystem model coupled to a hydrodynamic model (coupled models; e.g., Franks and Chen, 2001; Gregg et al., 2003; Walsh et al., 2003; Holt et al., 2005; Kindle et al., 2005; Allen et al., 2007). Indeed, summary diagrams present a useful method to succinctly communicate various aspects of coupled model performance, since extensive lists of metric values in tabular form may become tedious. The use of summary diagrams should also be encouraged in order to address several other practical and scientific concerns. First, many coupled model skill assessment exercises that have appeared in the recent literature still rely principally upon graphics that emphasize direct visual comparisons between model results and observations (Stow et al., 2009), such as a time series plot or a side-by-side comparison of one- or two-dimensional property fields (chlorophyll, nitrate, etc.). If the statistical and graphical techniques that are integral to the summary diagram approach become more widely accepted and presented, this may encourage more quantitative statements about coupled model skill. Second, summary diagrams are particularly useful for quantitatively comparing the performance of an ensemble of different models or multiple permutations of a single model. Given that there remains continuing uncertainty in the structure and parameterization of ecosystem models (e.g., Friedrichs et al., 2007), summary and quantitative skill assessment techniques may become efficient facilitators of improved prognostic performance.

Accordingly, one potential statistical and graphical skill assessment approach is to render a Taylor diagram (Taylor, 2001). Taylor diagrams exploit relationships between known statistical quantities in order to provide summary information about particular aspects of model performance, and were developed to aid in the monitoring of complex ocean-atmosphere climate models. The Taylor diagram, as is the case for many potential model skill assessment tools, is not discipline-specific, and several recent marine ecosystem modeling papers have presented such diagrams as part of a model skill assessment scheme (Gruber et al., 2006; Raick et al., 2007). Here we begin with an assessment of the Taylor diagram and the statistics it communicates for the specific purpose of coupled model skill assessment. Taylor diagrams are an appropriate place to begin our evaluation of summary diagrams given their increasing use in a wide range of modeling disciplines; however, summary diagrams are only as useful as the metrics they communicate, and so our analysis includes an exposition of how relationships between widely known statistical quantities may be further utilized to construct other types of summary diagrams that communicate additional aspects of model performance.

While the statistical methods and diagrams developed and discussed here may potentially be applied to many other types of model result to data comparisons, we nonetheless present results from a coupled hydrodynamic-ecosystem model and ocean color products derived from SeaWiFS satellite ocean color data in order to explicitly illustrate potential problems arising from this type of skill assessment. To that end, summary information about the modeling and satellite ocean color methods is given below (Section 2), whereas detailed descriptions of the statistical methods and display techniques are given in the course of the main analysis (Section 3). In Section 3.1, we examine the Taylor diagram and the univariate statistics it summarizes by presenting several example applications that demonstrate the strengths and weaknesses of this approach. In Section 3.2, we develop an alternative summary diagram, the target diagram, which provides information about additional aspects of model performance that may be of particular concern to the skill assessment of ecosystem models. In Section 3.3, we identify a potentially undesirable property of RMSD-based metrics, and present an alternative skill score and skill score-based summary diagram.

2. Methods

Results from an experimental ecosystem modeling environment, the Naval Research Laboratory Ecological-Photochemical-Bio-Optical-Numerical Experiment (which for brevity is referred to as Neptune), are presented here as a prototypical example of a complex modeling system. A detailed description of the Neptune modeling construct, including all state equations, parameter designations, and optical calculations, may be found in Jolliff and Kindle (2007). The modeling system is composed of four core elements: (1) the biogeochemical model that describes the flow and transformation of elemental reservoirs (carbon, nitrogen, and phosphorus) as a result of phytoplankton primary production and subsequent physiological processes and trophic interactions; (2) a visible optics module that relates the biogeochemical elemental reservoirs to spectrally explicit optical properties, describes the vertically resolved attenuation of incident, spectrally decomposed irradiance, and budgets photons absorbed by living phytoplankton to perform light-growth calculations; (3) an ultraviolet (UV) optics module that determines the attenuation of spectrally decomposed UV irradiance and the potential UV-stimulated photochemical degradation of colored dissolved organic matter (CDOM); and (4) a description of the spectrally decomposed UV and visible irradiance boundary conditions.

The Neptune system is designed for integration with any hydrodynamic model capable of describing the advection-diffusion of state variables. Here we examine the one-dimensional case by coupling the model to the Modular Ocean Data Assimilation System (MODAS). MODAS is described in Fox et al. (2002). Briefly, the system uses optimal interpolation (Bretherton et al., 1976) to render daily satellite estimates of sea surface temperature (SST) and sea surface height (SSH) onto a two-dimensional grid. A subsurface temperature profile is then retrieved from the U.S. Navy's Master Oceanographic Observational Data Set. Deviation from subsurface climatology is then estimated based upon SST and SSH deviation from surface climatology. The result is a synthetic three-dimensional temperature field.
The MODAS fields were averaged over 4 years (2001-2004) to approximate an average annual cycle of summer thermal stratification followed by winter overturn for a 1° × 1° area in the western Gulf of Mexico (center position 24.0° N, 94.5° W). Vertical eddy diffusion coefficients were imputed from MODAS synthetic temperature fields using the Pacanowski and Philander (1981) vertical mixing scheme. Daily and vertically resolved (total depth z = 161 m; Δz ≈ 1 m) eddy diffusion coefficients were used to solve for the vertical turbulent mixing of model state variables using a fully implicit method with a time step of 1800 s. The coupled model was initialized using temperature-nutrient relationships observed in the Gulf of Mexico (Jochens et al., 2002) and then run for ten simulation years to reach the steady state solution for transformations of carbon, nitrogen, and phosphorus in the upper ocean. The system was forced to material conservation by implicit remineralization of all particulates that sank below the deepest grid cell (z = 161 m).
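The text does not spell out the discretization behind the fully implicit mixing step. As a minimal sketch, assume cell-centred concentrations on a uniform 1 m grid, diffusivities defined on the interior cell faces, and no-flux surface and bottom boundaries (the model's remineralization of particulates sinking below the deepest cell is not represented). Under those assumptions, one backward-Euler step reduces to a tridiagonal solve. The function name `implicit_diffusion_step` and the uniform diffusivity in the usage lines are hypothetical.

```python
def implicit_diffusion_step(c, kappa, dz, dt):
    """Advance c by one fully implicit (backward-Euler) vertical diffusion step.

    c     -- concentrations at n cell centres, ordered surface to bottom
    kappa -- eddy diffusivities (m^2 s^-1) at the n-1 interior faces;
             kappa[i] sits between cell i and cell i+1
    dz    -- uniform cell thickness (m); dt -- time step (s)
    Surface and bottom faces are treated as no-flux (an assumption).
    """
    n = len(c)
    sub = [0.0] * n   # coefficient on c[i-1]
    diag = [1.0] * n  # coefficient on c[i]
    sup = [0.0] * n   # coefficient on c[i+1]
    for i, k in enumerate(kappa):
        r = k * dt / dz ** 2
        diag[i] += r
        diag[i + 1] += r
        sup[i] = -r
        sub[i + 1] = -r

    # Thomas algorithm: forward elimination, then back substitution.
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = sup[0] / diag[0]
    dp[0] = c[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / denom
        dp[i] = (c[i] - sub[i] * dp[i - 1]) / denom
    out = [0.0] * n
    out[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        out[i] = dp[i] - cp[i] * out[i + 1]
    return out


# Hypothetical usage on a 161-cell, 1 m grid with an 1800 s step.
profile = [0.0] * 161
profile[0] = 1.0                 # tracer confined to the surface cell
kappa = [1e-4] * 160             # hypothetical uniform diffusivity
new_profile = implicit_diffusion_step(profile, kappa, dz=1.0, dt=1800.0)
```

Each column of the tridiagonal matrix sums to one, so the tracer inventory is conserved to rounding error. Backward Euler is also unconditionally stable for diffusion, which is why a long 1800 s step is acceptable.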

The coupled model results were compared to local area coverage SeaWiFS ocean color data that were received and archived at the Naval Research Laboratory (NRL), Stennis Space Center. The satellite data were processed and the intervening atmospheric signal removed using NRL's Automated Processing System (APS). The atmospheric correction procedures are compliant with National Aeronautics and Space Administration SeaWiFS data processing protocols. Three NRL APS products derived from SeaWiFS data were examined: (1) the surface chlorophyll-a concentration, which was determined from the OC4v4 band ratio algorithm (O'Reilly et al., 1998); (2) the surface phytoplankton absorption coefficient (443 nm); and (3) the surface colored detrital matter (CDM) absorption coefficient (412 nm). The latter two products were determined from the multiband quasi-analytic algorithm (Lee et al., 2002), which estimates total absorption coefficients over the SeaWiFS visible bands and then further decomposes them into phytoplankton and detrital contributions. Each daily spatial mean of SeaWiFS data over 4 years (2001-2004) from the 1° western Gulf of Mexico grid was used to construct a satellite ocean color time series, in which days missing because of clouds were filled by linear interpolation. The time series was lowpass filtered to remove variability at periods shorter than 10 days. Averages for each day of the year were then computed across the 4 years to construct the annual climatology.
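The gap-filling, filtering, and climatology steps can be sketched as follows. The text does not name the lowpass filter, so this sketch assumes a centred boxcar (running mean) with a hypothetical 11-day window. It also assumes 365-day years, ignoring the 2004 leap day, and a series that begins and ends on valid (cloud-free) days. All function names are illustrative.

```python
def fill_gaps(series):
    """Linearly interpolate interior gaps (None) between valid values."""
    out = list(series)
    valid = [i for i, v in enumerate(out) if v is not None]
    for lo, hi in zip(valid, valid[1:]):
        for i in range(lo + 1, hi):
            w = (i - lo) / (hi - lo)
            out[i] = out[lo] + w * (out[hi] - out[lo])
    return out


def running_mean(x, window=11):
    """Centred boxcar lowpass filter; the window shrinks at the series ends."""
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def daily_climatology(x, days_per_year=365):
    """Average each day of the year across all complete years in x."""
    n_years = len(x) // days_per_year
    return [sum(x[y * days_per_year + d] for y in range(n_years)) / n_years
            for d in range(days_per_year)]
```

In use, the three functions are applied in sequence, for example `daily_climatology(running_mean(fill_gaps(raw)))`, where `raw` is the hypothetical daily series with `None` for cloudy days.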

3. Results

The model results are compared with the daily climatology calculated from 4 years of SeaWiFS data (Fig. 1) for three surface bio-optical fields: the surface chlorophyll-a concentration, the surface phytoplankton absorption coefficient (443 nm), and the surface CDM absorption coefficient (412 nm). The satellite estimates of these surface quantities will herein be referred to as the reference field, and the model's simulated surface bio-optical quantities will be referred to simply as the model field.

The Neptune model's three size-based phytoplankton functional groups are presently parameterized so that picophytoplankton have a higher absorption efficiency (per unit chlorophyll-a) than larger phytoplankton, as has been observed in the laboratory and in the field (e.g., Bricaud et al., 2004; Millan-Nunez et al., 2004). Thus the model phytoplankton absorption and total chlorophyll fields may vary with respect to one another due to differences in the relative dominance of simulated phytoplankton size fractions. In the example given in the following section, the satellite estimates of phytoplankton absorption and chlorophyll are therefore used as a potential observational constraint on the simulated competition between phytoplankton size fractions.

3.1. Taylor diagrams and pattern statistics

For the one-dimensional case, in which the model's surface values are averaged over the upper 10 m each simulated day and compared with a single daily reference value, the model and reference fields resemble sinusoidal functions of time, or waveforms (Fig. 1). A measure of the potential phase shift between the two waveforms is then also, more generally, a common measure of the agreement between two fields: the linear correlation coefficient, R, which is defined by:

$$R = \frac{\frac{1}{N}\sum_{n=1}^{N}(m_n - \bar{m})(r_n - \bar{r})}{\sigma_m \sigma_r} \tag{1}$$

The letter m indicates the model field, r indicates the reference field, the overbar indicates the average, and σ is the standard deviation.

Fig. 1. Daily surface values for the (A) chlorophyll-a concentration (mg m⁻³), (B) phytoplankton absorption coefficient (443 nm, m⁻¹), and (C) CDM absorption coefficient (412 nm, m⁻¹) are indicated for the final 2 years of the model's steady state solution (red line) and the SeaWiFS climatology (black line). Two years are shown in order to emphasize the winter peak and bring further emphasis to temporal misfits (i.e., phase misfits quantified by linear correlation coefficients).

The correlation coefficient is bounded by the range −1.0 ≤ R ≤ 1.0. In general, as the phase between two temporal signals approaches agreement, R approaches 1.0. It is difficult, however, to discern information about differences in amplitude between two signals from R alone. For this reason, another summary statistic, the normalized standard deviation, may be introduced:

$$\sigma^* = \frac{\sigma_m}{\sigma_r} \tag{2}$$

The normalized standard deviation and the correlation coefficient from each of the three model to reference field comparisons may be displayed on a single Taylor diagram (Fig. 2). The Taylor diagram is a polar coordinate diagram that assigns the angular position to the inverse cosine of the correlation coefficient, R. A correlation coefficient of 0 is thus 90° away from a correlation coefficient of 1 (see scaling on Fig. 2). The radial (along-axis) distance from the origin is assigned to the normalized standard deviation, σ*. The reference field point, which comprises the statistics generated from a redundant reference to reference comparison, is indicated at the polar coordinates (1.0, 0.0). The model to reference comparison points may then be gauged by how close they fall to the reference point. This distance is proportional to the unbiased Root-Mean-Square Difference (RMSD'), as defined by:

$$\mathrm{RMSD}' = \left(\frac{1}{N}\sum_{n=1}^{N}\left[(m_n - \bar{m}) - (r_n - \bar{r})\right]^2\right)^{1/2} \tag{3}$$

where the overbars indicate the mean. The term unbiased is used herein to emphasize that Eq. (3) removes any information about the potential bias (B), which is defined as the difference between the means of the two fields:

$$B = \bar{m} - \bar{r} \tag{4}$$

In other words, the unbiased RMSD (RMSD') is equal to the total RMSD if there is no bias between the model and reference fields. This may be verified from the quadratic relationship between the unbiased RMSD, the bias, and the total RMSD:

$$\mathrm{RMSD}^2 = B^2 + \mathrm{RMSD}'^2 \tag{5}$$

where the total RMSD is a measure of the average magnitude of difference and is defined by:

$$\mathrm{RMSD} = \left(\frac{1}{N}\sum_{n=1}^{N}(m_n - r_n)^2\right)^{1/2} \tag{6}$$
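Eqs. (1) to (6) can be computed directly from paired model and reference series. The sketch below is a plain-Python illustration, not the authors' code. It uses population (1/N) moments throughout, and the synthetic sinusoids are hypothetical stand-ins for the daily fields of Fig. 1.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _std(x):
    # Population (1/N) standard deviation, matching the 1/N used in Eqs. (1), (3), (6).
    mu = _mean(x)
    return math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))


def skill_statistics(m, r):
    """Return the statistics of Eqs. (1)-(6) for model series m and reference series r."""
    n = len(m)
    m_bar, r_bar = _mean(m), _mean(r)
    s_m, s_r = _std(m), _std(r)
    R = sum((a - m_bar) * (b - r_bar) for a, b in zip(m, r)) / (n * s_m * s_r)  # Eq. (1)
    sigma_star = s_m / s_r                                                       # Eq. (2)
    rmsd_unbiased = math.sqrt(
        sum(((a - m_bar) - (b - r_bar)) ** 2 for a, b in zip(m, r)) / n)         # Eq. (3)
    bias = m_bar - r_bar                                                         # Eq. (4)
    rmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(m, r)) / n)                # Eq. (6)
    return {"R": R, "sigma_star": sigma_star, "rmsd_unbiased": rmsd_unbiased,
            "bias": bias, "rmsd": rmsd}


# Hypothetical waveforms: a phase-shifted, damped, positively biased "model".
days = range(365)
ref = [1.0 + 0.5 * math.sin(2 * math.pi * t / 365) for t in days]
mod = [1.2 + 0.4 * math.sin(2 * math.pi * (t - 20) / 365) for t in days]
stats = skill_statistics(mod, ref)
```

With these definitions, the decomposition in Eq. (5) holds exactly up to rounding. That makes it a convenient sanity check for any implementation.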
In contrast to the total RMSD, the unbiased RMSD may be conceptualized as an overall measure of the agreement between the amplitude (σ) and phase (R) of two temporal patterns. For this reason, the correlation coefficient (R), normalized standard deviation (σ*), and unbiased RMSD are collectively referred to herein as pattern statistics. The three pattern statistics are related to one another by:

$$\mathrm{RMSD}'^2 = \sigma_r^2 + \sigma_m^2 - 2\sigma_r\sigma_m R \tag{7}$$

Fig. 2. Taylor diagram rendering of the model to reference field comparisons shown in Fig. 1: (A) chlorophyll-a concentration (mg m⁻³), (B) phytoplankton absorption coefficient (443 nm, m⁻¹), and (C) CDM absorption coefficient (412 nm, m⁻¹). As explained in the text, the radial distance is proportional to the normalized standard deviation (σ*) and the angular position corresponds to the linear correlation coefficient (R values). In accordance with Eq. (7), the distances between the labeled points and the reference point are proportional to the unbiased RMSD, Eq. (3).

Fig. 3 panels: (A) Chlorophyll-a; (B) Phytoplankton Absorption. Radial axis: normalized standard deviation (σm/σr).

Fig. 3. Taylor diagrams for grazing sensitivity model executions showing model to reference statistics for (A) the surface chlorophyll-a field and (B) the surface phytoplankton absorption field. The minimum total RMSD (1) and the minimum unbiased RMSD (2) are indicated on each plot. The color scale is added to both Taylor diagrams and runs from the minimum total RMSD (black) to the maximum total RMSD (red) for each set of model to reference comparison statistics. The time series results corresponding to points (1) and (2) in (B) are shown in Fig. 4.

It is this relationship that makes the Taylor diagram useful: misfits in amplitude may be compared with misfits in phase to discern how each contributes to the unbiased RMSD. Since the diagram is in standard deviation normalized space, the distance from the model points to the reference point follows from Eq. (7), which, recast in standard deviation normalized units (indicated by the asterisk), becomes:

$$\mathrm{RMSD}^{*\prime} = \sqrt{1 + \sigma^{*2} - 2\sigma^{*}R} \tag{8}$$

It can also be shown that, for a given positive value of R, the minimum of this function with respect to σ* occurs where σ* = R. This is an important relationship that we will refer to at several points later in the text.
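Dividing Eq. (7) by σr² yields Eq. (8). Placing a point at radius σ* and angle arccos R shows why its distance to the reference point (1.0, 0.0) equals RMSD*'. The sketch below checks that geometry and evaluates the minimum with respect to σ* noted above. The function names are hypothetical.

```python
import math


def taylor_xy(R, sigma_star):
    """Cartesian position of a point on a normalized Taylor diagram:
    radius sigma*, angle arccos(R) measured from the R = 1 axis."""
    theta = math.acos(R)
    return sigma_star * math.cos(theta), sigma_star * math.sin(theta)


def rmsd_star_unbiased(R, sigma_star):
    """Normalized unbiased RMSD, Eq. (8)."""
    return math.sqrt(1.0 + sigma_star ** 2 - 2.0 * sigma_star * R)


def best_sigma_star(R):
    """sigma* minimizing Eq. (8) for a fixed R, restricted to sigma* >= 0.

    The derivative of 1 + sigma*^2 - 2 sigma* R vanishes at sigma* = R, so for
    R > 0 the minimum is sqrt(1 - R^2); for R <= 0 the smallest value, 1.0,
    is reached as sigma* -> 0.
    """
    return max(R, 0.0)


# The distance from a point to the reference point (1, 0) reproduces Eq. (8).
x, y = taylor_xy(0.8, 0.9)
distance = math.hypot(x - 1.0, y)
```

For example, with R = 0.6 the choice σ* = 0.6 gives RMSD*' = 0.8. Reproducing the observed variability (σ* = 1) instead gives √0.8 ≈ 0.89, so the diagram places the variance-underestimating point closer to the reference.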

Fig. 2 shows that the chlorophyll model to reference field comparison point (A) appears closest to the reference point, whereas the phytoplankton absorption comparison point (B) appears farthest because of a poorer correlation and an underestimate of the standard deviation. Indeed, the chlorophyll comparison has the lowest normalized unbiased RMSD. However, the normalized bias, defined as:

$$B^* = \frac{\bar{m} - \bar{r}}{\sigma_r} \tag{9}$$

is much larger for the model chlorophyll field, which consistently tends to overestimate the reference field (as shown in Fig. 1A). Thus caution must be applied when interpreting a Taylor diagram that includes no information about the bias.
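R and σ* are computed from anomalies about each field's mean. Adding a constant to the model therefore leaves its Taylor diagram position unchanged while increasing the normalized bias of Eq. (9). The following minimal illustration uses synthetic, hypothetical waveforms.

```python
import math


def taylor_coords_and_bias(m, r):
    """Return (R, sigma*, B*) for model m and reference r, with B* from Eq. (9)."""
    n = len(m)
    m_bar, r_bar = sum(m) / n, sum(r) / n
    s_m = math.sqrt(sum((v - m_bar) ** 2 for v in m) / n)
    s_r = math.sqrt(sum((v - r_bar) ** 2 for v in r) / n)
    R = sum((a - m_bar) * (b - r_bar) for a, b in zip(m, r)) / (n * s_m * s_r)
    return R, s_m / s_r, (m_bar - r_bar) / s_r


# Hypothetical reference and model waveforms; the second model is the first plus a constant.
ref = [math.sin(2 * math.pi * t / 365) for t in range(365)]
model = [0.8 * math.sin(2 * math.pi * (t - 15) / 365) for t in range(365)]
model_offset = [v + 0.5 for v in model]

R1, s1, b1 = taylor_coords_and_bias(model, ref)
R2, s2, b2 = taylor_coords_and_bias(model_offset, ref)
```

Both models plot at the same Taylor diagram location, even though the offset model's B* is larger by 0.5/σr.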


The importance of adding information about the bias may be further demonstrated using a large number of model executions, such as during a sensitivity analysis. The advantage of the Taylor diagram in such cases is that it allows one to discern how the phase and amplitude of a simulated field change as the model is modified. The disadvantage is that information about any potential model to reference field bias must somehow be added to the diagram.

For example, the mortality rate for phytoplankton (cr) in the Neptune ecological model is described using the Ivlev (1961) formulation:

(10)

where Iv is the Ivlev parameter that describes how the maximum potential mortality rate (cm) is attenuated with decreasing phytoplankton biomass (C). With three phytoplankton functional groups and an estimated Iv parameter space incremented for 6 values, there are 216 (6³) potential grazing permutations.
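The exact Neptune form of Eq. (10) is documented in Jolliff and Kindle (2007). Purely as an illustration, the sketch below assumes the standard saturating Ivlev form, rate = rate_max·(1 − exp(−Iv·C)), which may differ from the Neptune equation. It shows how one Iv value per functional group over a six-value grid yields the 216 permutations. The Iv values, group labels, and function name are hypothetical.

```python
import itertools
import math


def ivlev_rate(rate_max, iv, biomass):
    """Assumed Ivlev (1961) form: the rate saturates at rate_max for high
    biomass and is attenuated toward zero as biomass declines."""
    return rate_max * (1.0 - math.exp(-iv * biomass))


IV_VALUES = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]   # hypothetical six-value Iv grid
GROUPS = ("small", "medium", "large")        # hypothetical labels for the three size classes

# One Iv value per functional group -> 6 ** 3 = 216 parameter combinations.
permutations = list(itertools.product(IV_VALUES, repeat=len(GROUPS)))
runs = [dict(zip(GROUPS, combo)) for combo in permutations]
```

Each entry of `runs` would define one sensitivity execution, and each execution contributes one point to a diagram such as Fig. 3.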

The results of 216 separate model executions are shown on two Taylor diagrams (Fig. 3). For brevity, only the first two field comparisons, phytoplankton chlorophyll and phytoplankton absorption, are shown, since the CDM absorption field is somewhat less sensitive to the grazing parameter selections. It is important to note that the model and reference fields were not log-transformed. In this case, log-transformation would not make a considerable difference; however, if there were large outliers in either field, then log-transformation could significantly affect the values of the statistical quantities. Some investigators may choose to log-transform the fields first, particularly if the bio-optical fields range over several orders of magnitude. If the fields are log-transformed, the investigator should be aware that statistical quantities generated from non-log-transformed values may be different.
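The effect of log-transformation on a statistic such as R is easy to check. In the hypothetical series below, one bloom-like value dominates both fields, while the background values are out of phase.

```python
import math


def correlation(x, y):
    """Linear correlation coefficient, as in Eq. (1) (the 1/N factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical chlorophyll-like values (mg m^-3): anti-phased background plus one shared bloom.
ref = [0.1, 0.3, 0.1, 0.3, 5.0]
mod = [0.3, 0.1, 0.3, 0.1, 4.0]

r_raw = correlation(mod, ref)
r_log = correlation([math.log10(v) for v in mod], [math.log10(v) for v in ref])
```

On the raw values R is about 0.99, driven almost entirely by the shared outlier. On log10 values, the anti-phased background pulls R down to about 0.75.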


Fig. 4. The reference field phytoplankton absorption (dashed line) is compared to the minimum total RMSD (1, solid black line) and the minimum unbiased RMSD (2, red line); these time series correspond to points (1) and (2) in Fig. 3B. As in Fig. 1, two years are shown to emphasize the winter peak and draw emphasis to phase misfits quantified by the linear correlation coefficients.
In both Taylor diagrams presented here, the model points that come closest to the reference point have the smallest unbiased RMSD values (Fig. 3). It would appear that the cluster of model points closest to the reference point may thus provide the closest fit to the data. Here, however, the inclusion of a relative total RMSD color scale, which indicates the range of minimum to maximum total RMSD using a spectral (rainbow) color scaling increment (Fig. 3), reveals that some points nearest the reference point may have larger total RMSD values. This is particularly the case for phytoplankton absorption (Fig. 3B), where the cluster of points closest to the reference point also has the largest total RMSD values. For the phytoplankton absorption field, improvement in the correlation coefficient appears to come at the expense of an increase in the bias and, consequently, the total RMSD. The minimum total RMSD (point 1) and minimum unbiased RMSD (point 2) from the phytoplankton absorption comparisons are also shown as a time series plot (Fig. 4). Clearly, the red line (minimum unbiased RMSD) has better phase agreement but overestimates the observed values.
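Selecting the runs marked (1) and (2) in Fig. 3, and placing each run on the black-to-red total RMSD color scale, requires only each run's bias and unbiased RMSD combined through Eq. (5). The sketch below uses three hypothetical runs.

```python
import math

# Hypothetical sensitivity runs: (label, bias B, unbiased RMSD').
runs = [("a", 0.002, 0.010), ("b", 0.015, 0.004), ("c", -0.004, 0.008)]

# Eq. (5): total RMSD is the hypotenuse of the bias and the unbiased RMSD.
total = {label: math.hypot(bias, unb) for label, bias, unb in runs}

best_total = min(total, key=total.get)                 # analogue of point (1)
best_unbiased = min(runs, key=lambda run: run[2])[0]   # analogue of point (2)

# Relative color position: 0.0 = minimum total RMSD (black), 1.0 = maximum (red).
lo, hi = min(total.values()), max(total.values())
color_position = {label: (t - lo) / (hi - lo) for label, t in total.items()}
```

Here run "b" has the smallest unbiased RMSD but, because of its bias, the largest total RMSD. This mirrors the phytoplankton absorption case.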

In coupled hydrodynamic-ecosystem modeling applications, information about the bias and the total RMSD may be just as important to the investigator as information about the pattern statistics, particularly when evaluating the sensitivity of a model to parameter selection for the purpose of minimizing the magnitude of the misfit between the model and reference fields. Taylor (2001) suggested adding lines of various lengths, corresponding to the total RMSD in proportion to the unbiased RMSD, onto the Taylor diagram; however, this procedure may result in a confusing diagram when large numbers of model runs are compared. A color scale modification of the Taylor diagram, as shown here (Fig. 3), may also be useful, but the overall import of the Taylor diagram may nonetheless be easily misinterpreted.

3.2. Target diagrams

An alternative to the Taylor diagram is a target diagram that provides summary information about both the pattern statistics and the bias, giving a broader overview of their respective contributions to the total RMSD. The relationship between the bias, unbiased RMSD, and total RMSD (Eq. (5)) provides a convenient starting point to construct such a diagram. In a simple Cartesian coordinate system, the unbiased RMSD may serve as the X-axis and the bias as the Y-axis. The distance between the origin and the model versus observation statistics (any point, s, within the X,Y Cartesian space) is then equal to the total RMSD (Fig. 5).

Fig. 5. Target diagram for model chlorophyll-a and reference chlorophyll-a comparisons. The Y-axis corresponds to the bias, the X-axis corresponds to the unbiased RMSD multiplied by the sign of the model and reference standard deviation difference (σd), and the distance from each point to the origin is proportional to the total RMSD. The minimum total RMSD (1) and the minimum unbiased RMSD (2) are indicated on the plot. The color scaling is the same as in Fig. 3.

By definition, the X-axis (unbiased RMSD) can never be negative. However, the X < 0.0 region of the Cartesian coordinate space may be utilized if the unbiased RMSD is multiplied by the sign of the standard deviation difference:

$$\sigma_d = \operatorname{sign}(\sigma_m - \sigma_r) \tag{11}$$

The resulting target diagram thus indicates whether the model standard deviation is larger (X > 0) or smaller (X < 0) than the reference field's standard deviation, as well as whether the bias is positive (Y > 0) or negative (Y < 0) (Fig. 5). The units of this diagram are all in chlorophyll concentration (mg m⁻³). This can again be addressed by normalizing the quantities by the reference field standard deviation (Fig. 6), so that the distance of each point from the origin is the standard deviation normalized total RMSD:

$$\mathrm{RMSD}^{*2} = B^{*2} + \mathrm{RMSD}^{*\prime 2} \tag{12}$$

Rendering the diagram in normalized units allows one to better compare the model's chlorophyll performance with other potential areas of performance, such as CDM absorption and phytoplankton absorption.
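The target diagram coordinates follow from Eqs. (3), (4), and (11): X = σd·RMSD' and Y = B, optionally divided by σr to give the normalized version of Fig. 6. The sketch below uses a hypothetical function name. Eq. (11) would give zero when σm = σr, which would collapse X to 0 even if RMSD' > 0. The sketch therefore assigns that tie to the X ≥ 0 side, an assumption that keeps the distance to the origin equal to the total RMSD.

```python
import math


def target_point(m, r, normalize=True):
    """(X, Y) position on a target diagram for model m and reference r.

    X = sigma_d * RMSD' (Eqs. (3) and (11)), Y = B (Eq. (4)); with
    normalize=True both are divided by sigma_r, as in Fig. 6.
    """
    n = len(m)
    m_bar, r_bar = sum(m) / n, sum(r) / n
    s_m = math.sqrt(sum((v - m_bar) ** 2 for v in m) / n)
    s_r = math.sqrt(sum((v - r_bar) ** 2 for v in r) / n)
    rmsd_unbiased = math.sqrt(
        sum(((a - m_bar) - (b - r_bar)) ** 2 for a, b in zip(m, r)) / n)
    bias = m_bar - r_bar
    # copysign returns +1.0 when s_m == s_r, placing ties on the X >= 0 side.
    sigma_d = math.copysign(1.0, s_m - s_r)
    x, y = sigma_d * rmsd_unbiased, bias
    if normalize:
        x, y = x / s_r, y / s_r
    return x, y


# Hypothetical series: an over-energetic, positively biased model and a damped, unbiased one.
ref = [1.0, 2.0, 3.0, 4.0, 5.0]
x1, y1 = target_point([2.0, 2.5, 4.0, 4.5, 7.0], ref)
x2, y2 = target_point([2.5, 2.75, 3.0, 3.25, 3.5], ref)
```

The first model lands in the upper right quadrant (too much variability, positive bias). The second sits on the negative X-axis (too little variability, no bias).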

Furthermore, markers within the diagram may be added to provide an additional basis for interpreting model performance. For example, the investigator may wish to gauge how the model's total RMSD compares to that of the time series mean. In other words, if the first guess is the time series average, does the model provide an overall improvement over the first guess with respect to the minimization of the average misfit between the model and reference fields?



Fig. 6. Normalized target diagram for model chlorophyll-a and reference chlorophyll-a comparisons. The axes are the same as in Fig. 5, except that they are normalized by the reference field standard deviation (indicated by *). The thick line (M0) corresponds to a normalized total RMSD of 1.0; the thin line (M0.7) corresponds to RMSD* = 0.71. The significance of these markers is explained in the text. The dashed line represents the threshold of observational uncertainty (OU). The minimum total RMSD (1) and the minimum unbiased RMSD (2) are indicated on the plot. The color scaling is the same as in Figs. 3 and 5.
The total RMSD between the reference field and the reference field mean is simply the reference field's standard deviation. Since the diagram is in standard-deviation-normalized units, a normalized total RMSD value of 1.0 provides a convenient performance marker (marker M0, Fig. 6). If the investigator is concerned with the total RMSD, and not merely the pattern statistics, then any points with RMSD* > 1 may be considered poor performers since they offer no improvement over the time series average.

It is also interesting to note that the normalized total RMSD (RMSD*) is related to the modeling efficiency (MEF) metric presented in Stow et al. (2009) via the relationship MEF = 1 − RMSD*². The MEF may be used to discern how well a model performs as a predictor of the data compared to the mean of the data (Stow et al., 2003; Nash and Sutcliffe, 1970). This underscores the significance of the RMSD* = 1 (M0) marker within the normalized target diagram, since points between it and the origin also have a positive MEF score, i.e., they predict the data better than the data mean does.
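The identity MEF = 1 − RMSD*² can be verified directly. The sketch below is a minimal illustration, assuming the Nash and Sutcliffe form of the efficiency (one minus the sum of squared model/data differences divided by the sum of squared deviations of the data about their mean) and population standard deviations; the data and the helper names are hypothetical. Using the data mean as the "first guess" lands exactly on the M0 marker (RMSD* = 1, MEF = 0).

```python
import numpy as np


def modeling_efficiency(model, ref):
    """Nash-Sutcliffe style efficiency: 1 - SSE / sum of squares about the data mean."""
    m = np.asarray(model, dtype=float)
    r = np.asarray(ref, dtype=float)
    return 1.0 - np.sum((m - r) ** 2) / np.sum((r - r.mean()) ** 2)


def normalized_total_rmsd(model, ref):
    """Total RMSD divided by the (population) standard deviation of the reference."""
    m = np.asarray(model, dtype=float)
    r = np.asarray(ref, dtype=float)
    return np.sqrt(np.mean((m - r) ** 2)) / r.std()


ref = np.array([0.12, 0.18, 0.35, 0.41, 0.22, 0.15])    # hypothetical data
model = np.array([0.20, 0.25, 0.30, 0.33, 0.28, 0.21])  # hypothetical model
mean_guess = np.full_like(ref, ref.mean())              # "first guess" = data mean

for name, m in [("model", model), ("mean", mean_guess)]:
    mef = modeling_efficiency(m, ref)
    rstar = normalized_total_rmsd(m, ref)
    print(f"{name}: MEF = {mef:.3f}, 1 - RMSD*^2 = {1 - rstar**2:.3f}")
```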

A weakness of the target diagram is that it does not provide explicit information about the correlation coefficient. However, there are certain limits inherent in the statistics summarized by the diagram that one may use to make some inference about the correlation coefficient. For example, recall the relationship between the correlation coefficient, the normalized standard deviation, and the normalized unbiased RMSD (Eq. (8)). It can be shown that for values of R where −1.0 ≤ R ≤ 0.0, the minimum value of RMSD*' over all potential values of σ* (where 0.0 < σ* < ∞) approaches 1.0: in Eq. (8), RMSD*'² = 1 + σ*² − 2σ*R, and both σ*² and −2σ*R are non-negative when R ≤ 0. Thus no model/reference comparison points that appear on the target diagram within the range −1.0 < X < 1.0 can be negatively correlated. Since the square of the normalized bias is always non-negative, then by extension all points where RMSD* < 1.0 must also be positively correlated. In other words, the first marker at RMSD* = 1.0 (marker M0, Fig. 5) also establishes that all points between it and the origin are positively correlated. Positively correlated results may appear outside this marker; however, these points will differ substantially from the observations due to a significant bias, a difference in variance, or some combination thereof. This relationship may be formally expressed as follows:

$$\forall\, s \in \{\mathrm{RMSD}^{*} \mid \mathrm{RMSD}^{*} < 1.0\} \;\rightarrow\; R > 0.0 \qquad (13)$$

where s is a notation for any point on the target diagram. Similar markers based upon the correlation coefficient may be established closer to the origin for values of R where R > 0.0. In accordance with Eq. (8), for any positive value of R the minimum value of RMSD*' occurs where σ* = R. Thus if one wants to determine the minimum possible unbiased RMSD value (M_R1) given a specific correlation value, R1, then the solution may be expressed as:


$$M_{R_1} = \min\left(\mathrm{RMSD}^{*\prime}\right) = \sqrt{1.0 + R_1^{2} - 2R_1^{2}} = \sqrt{1.0 - R_1^{2}} \qquad (14)$$

Since the minimum total RMSD must also occur where the bias is equal to 0.0, M_R1 is also the minimum total RMSD value for a given correlation coefficient value, R1. For the general case where R1 > 0.0:

$$\forall\, s \in \{\mathrm{RMSD}^{*} \mid \mathrm{RMSD}^{*} < M_{R_1}\} \;\rightarrow\; R > R_1 \qquad (15)$$

For example, Fig. 6 shows the second marker towards the origin for R1 = 0.7. Thus all points between this marker (M0.7) and the origin are indicative of a correlation coefficient greater than 0.7.
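The marker radii of Eqs. (13) to (15) follow from Eq. (8), taken here as RMSD*'² = 1 + σ*² − 2σ*R. The sketch below is a minimal check under that assumption (the helper names are hypothetical): a brute-force scan over σ* recovers the minimum at σ* = R1 with value √(1 − R1²), about 0.71 for R1 = 0.7, and shows that for R ≤ 0 the unbiased RMSD never falls below the M0 radius of 1.0.

```python
import math


def unbiased_rmsd_star(sigma_star, R):
    """Eq. (8): normalized unbiased RMSD from sigma* and R."""
    return math.sqrt(1.0 + sigma_star**2 - 2.0 * sigma_star * R)


def correlation_marker(R1):
    """Eq. (14): marker radius M_R1 = sqrt(1 - R1**2), for 0 < R1 <= 1."""
    return math.sqrt(1.0 - R1**2)


# Brute-force check: scan sigma* on (0, 3] and locate the minimum of Eq. (8)
grid = [i / 1000 for i in range(1, 3001)]
for R1 in (0.7, 0.9):
    best = min(grid, key=lambda s: unbiased_rmsd_star(s, R1))
    print(f"R1={R1}: argmin sigma*={best:.3f}, "
          f"min RMSD*'={unbiased_rmsd_star(best, R1):.4f}, "
          f"M_R1={correlation_marker(R1):.4f}")

# For R <= 0 the minimum never drops below 1.0 (the M0 marker)
print(min(unbiased_rmsd_star(s, -0.3) for s in grid) > 1.0)
```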

The  color  scale  in  Fig.  6  is  redundant:  both  the  distance 
from  the  origin  and  the  color  index  are  proportional  to  the 
total  RMSD.  The  color  variable  is  thus  left  as  a  free  variable 
that  may  be  used  to  also  explicitly  indicate  the  correlation 
coefficient,  or  it  may  be  used  to  indicate  any  supplemental 
information  regarding  the  simulations  that  are  displayed  in 
the  diagram  (Friedrichs  et  al.,  2009).  In  our  example,  the 
sensitivity  analysis  is  focused  upon  the  grazing  parameters. 
We may define an aggregate index of phytoplankton grazing stress (AI) as the sum of the three Ivlev parameters and display this index using the color scale, as in Fig. 7. Clearly, the AI most appreciably impacts the bias: as aggregate grazing stress increases, the simulations consistently underestimate the satellite-based observations of surface chlorophyll. Furthermore, the lowest aggregate grazing stress corresponds to the highest bias (point 2, Fig. 7).

Diagrams that summarize repeated comparisons of model results and data should also provide some indication of the uncertainties that exist within the data. One may define data as truth plus some unknown observational uncertainty. The advantage of using a satellite climatology based upon a large number of spatial means, as in this case, is that one may choose to assume that the ensemble average observational uncertainty approaches zero as the total number of observations becomes very large (n > 1000). One approach might be to state that assumption and forgo any further indication of observational uncertainty. A note of caution must be applied, however, insofar as this approach also assumes that the observational uncertainty is unbiased.

Nevertheless, in the more general case there exist many potential sources of observational uncertainty arising, in part, from measurement error. For satellite data, these errors may arise from imperfections in the satellite sensor, errors in the algorithms applied, atmospheric correction errors, and numerous other sources beyond the scope of this paper. It is therefore reasonable to assume that there must be some average minimum threshold value for the total RMSD below which further improvement in model/data agreement may not be significant. The dashed line in Fig. 5 is an estimate of this observational uncertainty (OU) threshold. Points that fall between this limit and the origin are all within the range of estimated observational uncertainty.

To be sure, observational uncertainty is a potentially complicated and contentious subject. Our objective here is simply to represent some estimate of this uncertainty on the target diagram so as to indicate where further efforts towards improved model to data agreement may not be a prudent use of time and resources. While it is entirely reasonable and appropriate to assume that observational uncertainty does place an upper limit on potential improvements in model performance, our tentative estimates of this average uncertainty should be regarded as preliminary, and much more work in this area needs to be done.

In this case, an average observational uncertainty was assumed for the satellite time series based on literature values for chlorophyll algorithm accuracy in optically deep waters (Bailey and Werdell, 2006; McClain et al., 2006) without any further consideration of the uncertainty within the measurements to which the satellite data are compared. If the average observational uncertainty (σ_o) is expressed as a percent, then σ_o r̄ (the relative uncertainty times the reference field mean) may be used as an estimate for the average value of uncertainty for the time series. For example, a σ_o value of ±15% and an average chlorophyll-a observation of 0.2 mg m⁻³ would yield an average uncertainty of ±0.03 mg m⁻³. A model to reference field total RMSD of <0.03 mg m⁻³ is within the average observational uncertainty threshold, and further improvement (model to data misfit reduction) may not be meaningful.

This assumed OU limit may be placed on the target diagram by normalizing σ_o r̄ by the reference field standard deviation (dashed line, Fig. 7). The normalization procedure effectively means that the assumed average observational uncertainty (σ_o) is divided by the coefficient of variation, which is the reference field standard deviation divided by the reference field mean. The coefficient of variation is a common measure of the dispersion within a distribution. It is beyond the scope of this paper to further examine how the dispersion, in turn, may be impacted by the observational uncertainty, but we recognize that they are not necessarily independent.

In summary, the target diagram displays the model to reference field bias (Y-axis) and the model to reference field unbiased RMSD (X-axis). The distance between any point, s, and the origin is then the value of the total RMSD. All of the quantities may be normalized by the reference field standard deviation to remove the units of measurement. The outermost marker (M0: RMSD* = 1.0) establishes that all points between it and the origin represent positively correlated model and reference fields, and also have a positive MEF score. A second marker may be added to indicate another positive R value, such as R = 0.7, for which all points between it and the origin have correlation coefficients greater than R. Finally, a dashed line indicates the estimate of average observational uncertainty; further improvement in model to data agreement for points between this marker and the origin may not be meaningful.

The target diagram was also constructed for the phytoplankton absorption field (Fig. 8). In order to display the entire set of model versus reference comparisons for phytoplankton absorption, the scale for the target diagram (Fig. 8) had to be expanded to encompass RMSD* = 2. Note that the simulations with the best pattern statistics (Fig. 3B) also have a very large positive bias (red cluster, Fig. 8). In this particular case, the target diagram delineates poorly performing model executions better than the Taylor diagram does, since the model is prone to a large bias for this field.

Fig. 7. Normalized target diagram for model chlorophyll-a and reference chlorophyll-a comparisons. The axes and the markers are the same as in Fig. 6. The color scaling has been changed to indicate the aggregate index (AI) for grazing stress, as explained in the text.

Fig. 8. Normalized target diagram for model/reference phytoplankton absorption fields. The axes are normalized by the reference field standard deviation (indicated by *). The thick line (M0) corresponds to a normalized total RMSD of 1.0; the thin line (M0.7) corresponds to RMSD* = 0.71. The significance of these markers is explained in the text. The dashed line represents the threshold of observational uncertainty (OU). The minimum total RMSD (1) and the minimum unbiased RMSD (2) are indicated on the plot.

3.3  The  skill  target  diagram 

Additional alternatives to the Taylor diagram for summarizing pattern statistics as a measure of model skill may be preferable, since there is a subtle discrepancy between improving the unbiased RMSD and improving the individual correlation coefficient and standard deviation statistics, and there may be circumstances where this consideration is important. For example, consider that there may be fundamental limits to the expected agreement between a model and a reference field. Even if all model inaccuracies and observational uncertainties could be eliminated, there may yet remain unforced oscillations that prevent exact model/reference field agreement. Suppose that an estimate of this uncertainty yields a maximum potentially attainable correlation coefficient of 0.65. As stated in Section 3.1, the minimum value of the unbiased RMSD occurs where σ* = R for positive values of R.

This relationship may be demonstrated on a Taylor diagram (Fig. 9). For R = 0.65 the minimum RMSD*' value occurs where σ* = 0.65. The three sets of pattern statistics correspond to the waveforms in Fig. 9B. The minimum average difference corresponds to the smallest-amplitude pattern, but if amplitude and phase are weighted equally, as in some potential alternative measures of model skill, then the waveform where σ* = 1.0 may be the most skillful.
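The waveforms of Fig. 9B can be mimicked with phase-shifted sinusoids. The sketch below is a minimal reconstruction under stated assumptions (one full, uniformly sampled cycle; a phase lag φ with cos φ = 0.65, which sets R = 0.65; model amplitudes equal to the three σ* values); it is not the authors' plotting code. The smallest-amplitude waveform yields the smallest unbiased RMSD even though its amplitude is the least faithful.

```python
import numpy as np

# Reference: one full cycle of a unit sinusoid, uniformly sampled
t = 2.0 * np.pi * np.arange(360) / 360
ref = np.sin(t)

# A phase lag phi gives a correlation of cos(phi); choose R = 0.65
R_target = 0.65
phi = np.arccos(R_target)


def pattern_stats(model, ref):
    """Return (R, sigma*, RMSD*') using population standard deviations."""
    sr = ref.std()
    R = np.corrcoef(model, ref)[0, 1]
    sigma_star = model.std() / sr
    diff = (model - model.mean()) - (ref - ref.mean())
    return R, sigma_star, np.sqrt(np.mean(diff**2)) / sr


results = {}
for amp in (0.65, 1.0, 1.35):  # the three cases of Fig. 9
    R, s, u = pattern_stats(amp * np.sin(t - phi), ref)
    results[amp] = u
    print(f"sigma* = {s:.2f}, R = {R:.2f}, RMSD*' = {u:.3f}")
```

Each result matches √(1 + σ*² − 2σ*R) from Eq. (8): about 0.760, 0.837, and 1.033 for σ* = 0.65, 1.0, and 1.35.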

This example demonstrates the implicit contradiction between minimizing the RMSD and improving σ* towards an ideal value of 1.0. If the goal is to improve the total RMSD, then σ* values < 1.0 are preferable. Clearly, if the two signals are out of phase, then reducing the model variance to a threshold value diminishes the total RMSD value. However, if the goal of the investigation is to move R and σ* independently as close to an ideal value of 1.0 as possible, then it may be inappropriate to use the total or unbiased RMSD as a model validation metric.

Fig. 9. (A) A Taylor diagram is shown for three model to reference field comparisons where R = 0.65 and (1) σ* = 0.65, (2) σ* = 1.0, and (3) σ* = 1.35. An example of three sinusoidal waveforms and a reference field corresponding to the statistics in (A) is shown in panel (B).

This is an important point since many model and observation comparison exercises may involve RMSD-based metrics. For example, Wallhead et al. (in press) use the term “skillful” to refer to model predictions that minimize mean-square differences. Sheng and Kim (2009) use RMSD metrics and Taylor diagrams as part of their water quality model evaluation scheme. Smith et al. (2009) use an RMSD-based cost function as part of a data assimilation scheme. Indeed, RMSD-based metrics of model performance are likely to continue to be used in a wide variety of contexts, and investigators should at least be cognizant of how RMSD-based functions or skill scores quantify mismatches in variance when correlation coefficients are less than unity.

Alternative metrics of model skill (skill scores) have been proposed (Murphy and Epstein, 1989; Taylor, 2001), such as:


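$$S_1 = 1.0 - \frac{2(1+R)}{\left(\sigma^{*} + 1/\sigma^{*}\right)^{2}} \qquad (16)$$

and

$$S_2 = 1.0 - \frac{(1+R)^{4}}{4\left(\sigma^{*} + 1/\sigma^{*}\right)^{2}} \qquad (17)$$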


The  prevailing  convention  is  to  have  the  skill  score  range 
between  0.0  (for  poor  skill)  and  1.0  (for  superior  skill).  This 
convention  is  reversed  here  since  our  objective  is  to  build  a 
summary  skill  target  diagram  similar  to  the  one  developed  in 
Section  3.2. 
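The asymmetry discussed in the next paragraph can be made concrete. The sketch below assumes the forms of Eqs. (16) and (17) given above (Taylor's 2001 scores with the maximum attainable correlation set to 1, recast so that 0 is best); the function names are hypothetical. At R = 0.7, a 50% underestimate of σ* (σ* = 0.5) scores worse than a 50% overestimate (σ* = 1.5) under S1 and S2, while RMSD*' ranks the same two cases the other way round.

```python
import math


def rmsd_unbiased_star(s, R):
    """Eq. (8): normalized unbiased RMSD."""
    return math.sqrt(1.0 + s * s - 2.0 * s * R)


def skill_s1(s, R):
    """Eq. (16) as written above: S1 = 1 - 2(1+R) / (s + 1/s)^2 (0 is best)."""
    return 1.0 - 2.0 * (1.0 + R) / (s + 1.0 / s) ** 2


def skill_s2(s, R):
    """Eq. (17) as written above: S2 = 1 - (1+R)^4 / (4 (s + 1/s)^2) (0 is best)."""
    return 1.0 - (1.0 + R) ** 4 / (4.0 * (s + 1.0 / s) ** 2)


R = 0.7
for s in (0.5, 1.0, 1.5):  # under-, exact, and over-estimate of the variance
    print(f"sigma*={s}: RMSD*'={rmsd_unbiased_star(s, R):.3f}, "
          f"S1={skill_s1(s, R):.3f}, S2={skill_s2(s, R):.3f}")
```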

An important feature to consider is how these potential skill scores proportionally penalize underestimates or overestimates of the standard deviation. For example, given a constant R value of 0.7, the normalized unbiased RMSD, S1, and S2 are shown for 0.0 < σ* < 2.0 in Fig. 10. Minimum skill scores occur where σ* = 1, consistent with our stated skill score convention. However, S1 and S2 appear to penalize underestimates of the variance more than proportional overestimates, and are thus the opposite of the RMSD*' statistic, which rewards variance underestimates. A potential alternative to these measures is a Gaussian function that penalizes proportional overestimates and underestimates of σ* equally over the range [0, 2]. Multiplication by a scaled correlation score may then constitute a measure of model skill:


S3  1 


(18) 


This measure of skill may now be incorporated into a diagram similar to the one developed in the previous section. Here, however, the emphasis is on the comparison of one model to another more than on the misfit between the model and the data. Accordingly, a relative measure of bias may be given as:

$$B_{m,i} = \frac{B_i}{\max\{|B_1|, |B_2|, |B_3|, \ldots, |B_n|\}} \qquad (19)$$

that is, the maximum normalized bias of the ith model execution is its bias divided by the maximum magnitude bias from the total set of n model to data comparisons.

If B_m serves as the Y-axis and S3 times the sign of the standard deviation difference (σ_d) serves as the X-axis, then the resulting skill target diagram renders distances from the origin that are proportional to:

$$ST = \sqrt{B_m^{2} + S_3^{2}} \qquad (20)$$

The contrast between the ST score and the total RMSD is that the skill score does not reward underestimates of the variance for correlation values less than one. Markers for the skill target diagram are based on the percentile ST score of the models. For example, in this case the mean ST score ($\overline{ST}$) is 0.51 and the standard deviation ($\sigma_{ST}$) is 0.28; thus the 90th percentile (assuming a normal score probability density function, and recalling that our skill convention rewards low scores instead of high scores) corresponds to $ST = \overline{ST} - 1.28\,\sigma_{ST}$, or ST = 0.15. A similar marker for the 50th percentile ($ST = \overline{ST}$) is shown on Fig. 11. In this case, the most skillful simulation (point 2, Fig. 11) is yet again different from the minimum total RMSD simulation (point 1, Fig. 11).
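The skill target coordinates and percentile markers can be sketched with the Python standard library. This is a minimal illustration, assuming S3 values are already computed via Eq. (18) and passed in as inputs; the helper names and the three-run ensemble are hypothetical. The percentile marker uses the normal-distribution assumption stated in the text, and because low ST is good, the 90th skill percentile is the 10th ST quantile, reproducing ST ≈ 0.15 from a mean of 0.51 and a standard deviation of 0.28.

```python
import math
from statistics import NormalDist


def max_normalized_bias(biases):
    """Eq. (19): divide each bias by the largest bias magnitude in the ensemble."""
    b_max = max(abs(b) for b in biases)
    return [b / b_max for b in biases]


def skill_target_point(b_m, s3, sigma_d):
    """Skill-target coordinates and the Eq. (20) distance.

    x = S3 signed by the standard deviation difference sigma_d, y = B_m,
    ST = sqrt(B_m**2 + S3**2). S3 is a precomputed input (Eq. (18)).
    """
    x = math.copysign(s3, sigma_d)
    return x, b_m, math.hypot(b_m, s3)


def st_percentile_marker(mean_st, sd_st, percentile):
    """ST radius for a skill percentile, assuming normally distributed ST.

    Low ST is good, so the p-th skill percentile is the (100 - p)-th ST quantile.
    """
    return NormalDist(mean_st, sd_st).inv_cdf(1.0 - percentile / 100.0)


print(round(st_percentile_marker(0.51, 0.28, 90), 2))  # → 0.15
print(round(st_percentile_marker(0.51, 0.28, 50), 2))  # → 0.51

# Hypothetical ensemble of three runs: raw biases, S3 scores, std differences
b_m = max_normalized_bias([0.2, -0.5, 0.1])
points = [skill_target_point(b, s3, sd)
          for b, s3, sd in zip(b_m, [0.3, 0.6, 0.2], [-0.05, 0.1, 0.02])]
```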

The discrepancy between minimum skill and RMSD scores is more pronounced for the phytoplankton absorption field (Fig. 12). The minimum unbiased RMSD score, which would appear to be the best fit in a Taylor diagram, is also indicated (point 3, Fig. 12). These three model fields are presented against the reference field in Fig. 13. Evidently, the minimum unbiased RMSD model field is unacceptable due to its large positive bias. In contrast, the minimum RMSD (point 1, Fig. 12) and superior skill (point 2, Fig. 12) model fields are less biased but are out of phase with the reference field by several months (Fig. 13). All three results provide information potentially useful to the investigator; other parameters may potentially be adjusted either to reduce the phase error for fields (1) and (2), or to reduce the bias in (3), which is better correlated with the reference field. The salient point to be made here, however, is that for multiple model executions the skill target diagram may identify potential contrasts between minimum RMSD and other measures of model skill.

Fig. 11. Skill target diagram for model to reference chlorophyll-a field comparisons. The minimum total RMSD (1), minimum skill score (2), and minimum unbiased RMSD (3) are indicated on the plot. The markers indicate the 50th and 90th percentile total skill scores (ST) for the total set of model to reference comparisons, as explained in the text. The X-axis is the S3 skill score multiplied by the sign of the standard deviation difference. The Y-axis is the maximum normalized bias. The color scale indicates the total RMSD values.

4.  Discussion 

An important point mentioned elsewhere in this special volume (Stow et al., 2009) is worth reiterating here: different statistical quantities (i.e., skill metrics) may capture different aspects of model performance, and a thorough assessment of model skill may require the simultaneous use of multiple types of skill metrics. Accordingly, it is important to recognize the relationships that exist between various statistical quantities and how they represent related but differentiable aspects of model performance. Linear correlation coefficients and variance comparisons help to identify similarities of pattern, and they may be combined in a way that is equivalent to the unbiased RMSD score (Eq. (7)), which succinctly quantifies pattern agreement. In our example of a one-dimensional time series, we related these aspects of model performance to the similarity of phase and amplitude between two time-dependent, sinusoid-like patterns, but this concept may be generalized to describe the shape (such as the pattern of potential contour lines) of multidimensional property fields.

Pattern agreement is an important aspect of model performance, and there may be instances where this aspect is of particular or exclusive concern to the investigator. For example, Li et al. (2007) use Taylor diagrams to compare modeled and observed distributions of soil moisture and precipitation. Since the average values from the simulations were adjusted to agree with observed averages, the pattern information was the primary aspect of interest in their climate model's performance. In such cases, Taylor diagrams are useful skill assessment tools insofar as they provide summary information about how the linear correlation coefficient and the variance comparisons each contribute to the unbiased RMSD on a two-dimensional diagram. Indeed, the pattern information may often be the primary area of interest for many climate model studies.

Fig. 12. Skill target diagram for model to reference phytoplankton absorption field comparisons. The minimum total RMSD (1), minimum skill score (2), and minimum unbiased RMSD (3) are indicated on the plot. The markers indicate the 50th and 90th percentile total skill scores (ST) for the total set of model to reference comparisons, as explained in the text. The X-axis is the S3 skill score multiplied by the sign of the standard deviation difference. The Y-axis is the maximum normalized bias. The color scale indicates the total RMSD values.

Nevertheless, in cases where the magnitude of the model results is not adjusted a posteriori, the Taylor diagram (and the statistical quantities it summarizes) may be an incomplete skill assessment tool, since it often provides no information about other aspects of model performance such as the bias (the comparison of mean values) or the total RMSD (a metric for overall model and data agreement). One way to remedy this omission is to modify Taylor diagrams via the addition of a color dimension indicating the magnitude of either the bias or the total RMSD. An example of this style of modification is given here and has been previously shown elsewhere (Orr, 2002).

More generally, however, information about the bias introduces the aspect of scale or magnitude to the model skill assessment process. For example, two surface chlorophyll fields may have a perfect correlation score and identical variances, but the model field may still be an order of magnitude larger than the observations. This would suggest that too much nitrogen or carbon, for example, resides within the phytoplankton compartment and the ecosystem model may be inappropriately parameterized or structurally inadequate. In many ocean ecosystem (or biogeochemical) model applications, the time-dependent flux of materials from one reservoir to another may be constrained by the magnitude of the observations, rather than merely the pattern information. This is particularly pertinent to the biological aspects of coupled models because the overall magnitude of biological productivity is a critical aspect of ecosystem function. Furthermore, while the unbiased RMSD may effectively quantify pattern agreement, it is seldom used as a metric for overall model and data agreement, whereas the total RMSD is more frequently applied to this task.

For these reasons, we have developed the target diagram, a Cartesian coordinate plot that provides summary information about how the magnitude and sign of the bias and the pattern agreement (unbiased RMSD) each contribute to the total RMSD magnitude. Markers may be added to the diagram in order to: (1) help identify limits based upon the correlation coefficient; (2) provide an assessment of model performance compared to an observational average (marker M0); and (3) indicate potential limits to model performance improvement when the average observational uncertainty has been estimated. The observational uncertainty marker creates a “bull's-eye” for the target diagram that may very effectively communicate the estimated limits of model performance to other investigators.

Fig. 13. The model and reference fields are plotted for the results indicated in Fig. 12: the minimum total RMSD (1), minimum skill score (2), and minimum unbiased RMSD (3, red).

For example, in our sensitivity analysis of grazing parameter selection, 216 model fields may be compared to three reference field categories for a total of 648 sets of model to reference field statistics. These may all be summarized on a single target diagram (Fig. 14). Cursory inspection of this summary diagram reveals that phytoplankton absorption is the most sensitive field and CDM absorption is the least. The phytoplankton absorption field is also prone to a large positive bias. The chlorophyll field appears to achieve the minimum magnitude for total difference statistics, but further improvement would be within the estimated range of average observational uncertainty.

To be sure, the purpose of both the Taylor and target diagrams is to compactly summarize statistical quantities that aid in the skill assessment of model performance. The utility of either approach depends on how adequately the metrics they summarize capture the relevant aspects of model performance. For the specific application to ocean ecosystem modeling, we suggest that target diagrams may better summarize the overall agreement between model and data, since aspects of pattern agreement and magnitude (bias) are given equal weight and one may clearly visualize how they each contribute to the total RMSD.

It would be inappropriate, however, to suggest that skill assessment must always be implicitly synonymous with finding the lowest RMSD value amongst an ensemble of model results, or an acceptably low RMSD value for a single model result. A potential deficiency in both the Taylor and target diagrams stems directly from a peculiarity of the RMSD metrics: for correlations less than unity (R < 1.0), the RMSD values are minimized where the normalized standard deviation is equal to the correlation (σ* = R) instead of at the ideal value of one (σ* = 1.0).

Another way to conceive of this behavior: if the correlation between a modeled and observed field is imperfect, i.e., in some areas the modeled values increase where or when the observed values decrease, then the average magnitude of this misfit may be reduced by diminishing the modeled field's variance, i.e., by underestimating the observed field's variance (assuming the bias is not a significant source of mismatch). For example, suppose a three-dimensional coupled model of phytoplankton growth and ocean circulation appears to adequately reproduce the observed details of chlorophyll patterns within a mesoscale eddy, only the eddy is in the wrong location when compared to the observations (a common type of mismatch for coupled models since modeled velocity fields are imperfect and advection is a time-integrative process). Given this spatial mismatch, the RMSD-based metrics of model/data misfit may improve if the details (i.e., the variance) of the modeled chlorophyll field are diminished or smoothed over. Would the investigator prefer a blurred modeled field over the one where the exclusive source of model/data disagreement appears to be dislocation?

Fig. 14. Summary target diagram for all three types of model-to-reference field comparisons: chlorophyll a (black), phytoplankton absorption (violet), and COM absorption (red). The dashed lines indicate the estimated observational uncertainty (OU) threshold (corresponding to the field color).

This circumstance may be clearly demonstrated using satellite ocean color patterns from areas of complex mesoscale variability, such as Moderate Resolution Imaging Spectroradiometer data for the Mozambique Channel off the southwest coast of Madagascar (Fig. 15A). A coupled model might mimic the complex pattern of apparent surface chlorophyll within mesoscale eddies and fronts (Fig. 15A), but only imperfectly with respect to spatiotemporal agreement. We approximate this kind of disagreement by reversing the array order (Fig. 15B), so that the hypothetical modeled field is effectively a mirror image of the data. The means and variances of the two fields are identical, but the correlation between them is quite low ($R = 0.09$). This results in high RMSD scores ($\mathrm{RMSD}^{*\prime} = \mathrm{RMSD}^{*} \approx 1.35$). These scores may be artificially improved simply by reducing the variance of the hypothetical model field (Fig. 15C) until the threshold criterion $\sigma^{*} = R$ is met. As a result of this procedure, the complex spatial details of the modeled chlorophyll field have been significantly diminished (Figs. 15B and C), yet the RMSD scores have improved ($\mathrm{RMSD}^{*} = 0.99$). Another way to demonstrate this property of RMSD-based metrics is to begin with the original field (Fig. 15A) and simply apply a large smoothing filter (Fig. 15D). Of the three hypothetical modeled fields (Figs. 15B, C, and D), one may be inclined to select B as the most skillful, though the RMSD scores run contrary to this inclination.
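The following is a minimal Python sketch of the three manipulations behind Fig. 15: array reversal, variance reduction to $\sigma^{*} = R$, and moving-average smoothing. It uses only the standard library. The input is a hypothetical synthetic one-dimensional field, not the MODIS image. The wrap-around (circular) filter, its half-width, and all function names are assumptions chosen for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pstd(values):
    # Population standard deviation (divides by N).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(x, y):
    # Pearson linear correlation coefficient R.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstd(x) * pstd(y))


def normalized_stats(model, ref):
    """Return (R, sigma*, RMSD*); sigma and total RMSD are divided by sigma_ref."""
    s_ref = pstd(ref)
    rmsd = math.sqrt(sum((m - r) ** 2 for m, r in zip(model, ref)) / len(ref))
    return correlation(model, ref), pstd(model) / s_ref, rmsd / s_ref


def circular_moving_average(values, half_width):
    # Centered moving average that wraps around the ends, so the mean is preserved.
    n, width = len(values), 2 * half_width + 1
    return [sum(values[(k + j) % n] for j in range(-half_width, half_width + 1)) / width
            for k in range(n)]


# Hypothetical 1-D stand-in for a chlorophyll transect (NOT the MODIS data in
# Fig. 15): a broad mode symmetric about the center plus a finer antisymmetric mode.
n = 100
center = (n - 1) / 2
ref = [2.0
       + 1.0 * math.cos(2 * math.pi * 2 * (k - center) / n)
       + 0.9 * math.sin(2 * math.pi * 5 * (k - center) / n)
       for k in range(n)]

# (B) Mirror image: identical mean and variance, low (here positive) correlation.
field_b = ref[::-1]
r_b = correlation(field_b, ref)

# (C) Shrink B's anomalies about its mean so that sigma* equals R.
mean_b = mean(field_b)
field_c = [mean_b + r_b * (v - mean_b) for v in field_b]

# (D) Heavily smoothed copy of the reference field.
field_d = circular_moving_average(ref, half_width=10)

results = {name: normalized_stats(field, ref)
           for name, field in (("B", field_b), ("C", field_c), ("D", field_d))}
for name, (r, s, e) in results.items():
    print(f"{name}: R = {r:.2f}, sigma* = {s:.2f}, RMSD* = {e:.2f}")
```

The synthetic field combines one mode symmetric about the center and one antisymmetric mode. Reversing it therefore yields a small positive R, which loosely mimics the low correlation of Fig. 15B. For this field the sketch reproduces the ordering described above. The smoothed field D has the lowest RMSD* and the variance-reduced field C is next. The mirror image B keeps all of the reference field's structure, yet it scores worst.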

Thus there are indeed cases where a distinction may appropriately be made between reducing RMSD statistics and increasing model skill. An alternative skill scoring system and skill target diagram were developed and presented for such a contingency. The advantage of this system is that, for $R < 1.0$, the minimum value of the skill score occurs where $\sigma^{*} = 1.0$ rather than where $\sigma^{*} = R$. In our example, the $S_3$ skill score, Eq. (18), would indicate that field (B) is indeed the most skillful (Fig. 15). There are potentially many other creative ways to combine correlations, variances, and other metrics into composite skill scores that have properties distinctly different from RMSD-based metrics. Our intent is not to promote a specific solution but, rather, to point out that a contradiction may arise between minimum RMSD scores and other potential definitions of model skill.
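The sketch below does not implement the $S_3$ score of Eq. (18). As a stand-in, it uses another composite score with the same qualitative property: Taylor's (2001) skill score, $S = 4(1+R)/[(\sigma^{*} + 1/\sigma^{*})^{2}(1+R_0)]$. The maximum attainable correlation $R_0$ is assumed to be 1. At the mirror-image correlation $R = 0.09$, the unbiased RMSD* is lowest at $\sigma^{*} = R$, whereas this score is best at $\sigma^{*} = 1$.

```python
import math


def rmsd_prime_star(sigma_star, r):
    # Normalized unbiased RMSD: sqrt(1 + sigma*^2 - 2 sigma* R).
    return math.sqrt(1.0 + sigma_star ** 2 - 2.0 * sigma_star * r)


def taylor_skill(sigma_star, r, r0=1.0):
    # Taylor (2001) skill score; r0 is the maximum attainable correlation (assumed 1).
    return 4.0 * (1.0 + r) / ((sigma_star + 1.0 / sigma_star) ** 2 * (1.0 + r0))


r = 0.09  # correlation of the mirror-image field in Fig. 15B
grid = [k / 1000 for k in range(10, 2001)]  # sigma* from 0.01 to 2.00

best_for_rmsd = min(grid, key=lambda s: rmsd_prime_star(s, r))
best_for_skill = max(grid, key=lambda s: taylor_skill(s, r))
print(f"RMSD*' is lowest at sigma* = {best_for_rmsd:.2f}")      # 0.09, i.e. sigma* = R
print(f"Taylor skill is highest at sigma* = {best_for_skill:.2f}")  # 1.00
```

The two metrics rank the mirror-image field and its variance-reduced copy in opposite order. This is the kind of contradiction between minimum RMSD and other definitions of skill that the paragraph describes.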

[Fig. 15 panel statistics relative to (A): (C) $R = 0.09$, $\sigma^{*} = 0.09$, $\mathrm{RMSD}^{*} = 0.99$, $S_3 = 0.99$; (D) $R = 0.72$, $\sigma^{*} = 0.58$, $\mathrm{RMSD}^{*} = 0.70$, $S_3 = 0.66$]

Fig. 15. A pattern of ocean color data is shown in panel (A) (surface chlorophyll fields; Moderate Resolution Imaging Spectroradiometer image 2S July 2007; data provided by NASA from their website at http://oceancolor.gsfc.nasa.gov/). To make a hypothetical model field wherein the misfit arises exclusively from spatial incoherence, the data array in (A) was reversed and is shown in panel (B) as a hypothetical modeled field. The resulting correlation is low, but the mean and variance are the same. The field in panel (B) was further manipulated so that the normalized standard deviation ($\sigma^{*}$) is equal to the correlation coefficient ($\sigma^{*} = R$). This field is shown in panel (C). As a final comparison, the field in panel (A) was smoothed using a moving average filter (panel D). The correlation ($R$), normalized standard deviation ($\sigma^{*}$), normalized total root-mean-square difference ($\mathrm{RMSD}^{*}$), and skill score ($S_3$) are shown beneath each panel for the comparison to the reference field (A). Panel (D) has the lowest $\mathrm{RMSD}^{*}$ score and panel (B) has the lowest skill score.

In summary, model skill assessment ultimately requires specifying which quantitative metrics should be applied and how they should be interpreted to constitute “good” or “bad” model performance. The “skill” portion of skill assessment may be mathematically defined, but the “assessment” will invariably rely upon the value judgments of the investigator. Our analysis has focused upon some widely known statistical quantities (linear correlation coefficients, means, and variances) and the ways they may be combined mathematically and graphically to describe RMSD-based measures of model/data misfit. Taylor diagrams are polar coordinate plots that focus upon pattern agreement. The target diagrams developed here summarize both pattern agreement and magnitude (bias), and show how each contributes to the total RMSD, a common metric of overall model/data agreement. Investigators should be cognizant of the aspects of model performance summarized by each of these statistical and graphical approaches before making claims of “model validation.” Furthermore, both methods presume that RMSD-based metrics are sufficient criteria upon which to base model skill assessments, and this may not always be the case.
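The geometry that distinguishes the two diagrams can be sketched in a few lines of Python (standard library only). This is an illustration, not the authors' plotting code. It assumes that all quantities are normalized by the reference standard deviation. It also assumes that the target diagram's horizontal axis carries the unbiased RMSD* signed by sign(σ_m − σ_r), and its vertical axis the normalized bias B*. The function names and example values are hypothetical.

```python
import math


def summary_stats(model, ref):
    # Means, population standard deviations, Pearson R, and unbiased RMSD'.
    n = len(ref)
    mm, mr = sum(model) / n, sum(ref) / n
    sm = math.sqrt(sum((v - mm) ** 2 for v in model) / n)
    sr = math.sqrt(sum((v - mr) ** 2 for v in ref) / n)
    r = sum((a - mm) * (b - mr) for a, b in zip(model, ref)) / (n * sm * sr)
    rmsd_prime = math.sqrt(sum(((a - mm) - (b - mr)) ** 2
                               for a, b in zip(model, ref)) / n)
    return mm, mr, sm, sr, r, rmsd_prime


def taylor_point(model, ref):
    # Polar coordinates: radius sigma*, azimuth arccos(R); reference field sits at (1, 0).
    _, _, sm, sr, r, _ = summary_stats(model, ref)
    theta = math.acos(max(-1.0, min(1.0, r)))  # clamp guards against round-off
    return (sm / sr) * math.cos(theta), (sm / sr) * math.sin(theta)


def target_point(model, ref):
    # x: unbiased RMSD* signed by sign(sigma_m - sigma_r); y: normalized bias B*.
    mm, mr, sm, sr, _, rmsd_prime = summary_stats(model, ref)
    sign = 1.0 if sm >= sr else -1.0
    return sign * rmsd_prime / sr, (mm - mr) / sr


# Hypothetical example values.
ref = [1.0, 3.0, 2.0, 5.0, 4.0]
model = [1.5, 2.5, 3.0, 4.0, 5.5]

x_t, y_t = target_point(model, ref)
x_p, y_p = taylor_point(model, ref)
sr = summary_stats(model, ref)[3]
rmsd_star = math.sqrt(sum((a - b) ** 2 for a, b in zip(model, ref)) / len(ref)) / sr

# Distance from the target-diagram origin is the total normalized RMSD ...
print(math.hypot(x_t, y_t), rmsd_star)
# ... while distance from the Taylor reference point (1, 0) is only the unbiased part.
print(math.hypot(x_p - 1.0, y_p), abs(x_t))
```

Bias moves a point vertically on the target diagram but leaves its Taylor-diagram position unchanged, because that position depends only on σ* and R. Both diagrams, however, still measure agreement in RMSD terms.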
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